5320 DATA VISUALIZATION PROJECT PROPOSAL
Teammates:
Sai Teja Talluri
Sai Ritwik Reddy Nagireddy
Swetha Vahana Reddy Morreppa
Jaswanthi Chowdary Kalapala
Introduction
Domain:
Retail Inventory Management
The retail industry relies heavily on effective inventory management to
ensure optimal stock levels, meet customer demand, and maximize
profitability.
Data visualization plays a crucial role in understanding inventory trends,
analyzing sales data, and making informed decisions regarding stock
replenishment and marketing strategies.
Workflow
Data Collection:
Obtaining relevant datasets from Kaggle, a popular platform for sharing data science datasets.
Initial Visualization:
Utilizing D3.js to visualize the uncleaned dataset, providing an initial overview of the data's structure and
potential insights.
Data Cleaning:
Using the Python programming language and libraries such as Pandas and NumPy to clean the dataset by handling
missing values, outliers, and inconsistencies.
Refined Visualization:
Using Python's data visualization libraries such as Matplotlib or Seaborn to create more refined
visualizations based on the cleaned dataset, focusing on specific variables of interest.
Dashboard Creation:
Utilizing Microsoft Power BI to design interactive dashboards and reports, incorporating the refined
visualizations to provide a comprehensive view of the analyzed data.
Report Generation:
Creating detailed reports summarizing the analysis findings, insights, and conclusions drawn from the
visualizations and data analysis process.
Data Abstraction: Dataset Type and Attributes
Sales Report
Product ID (Numeric): Unique identifier for each product.
Product Name (String): Name of the product.
Category (Categorical): Category to which the product belongs (e.g., electronics, clothing,
groceries).
Sales Quantity (Numeric): Quantity of the product sold.
Stock Levels (Numeric): Current stock levels of the product.
Price (Numeric): Unit price of the product.
Size (Categorical/Numeric): Size variation of the product (e.g., small, medium, large) or
numerical size.
Color (Categorical): Color variation of the product.
Catalog Category (Categorical): Category used for cataloging purposes.
Platform (Categorical): Platform where the product was sold (e.g., online, in-store).
Weight (Numeric): Weight of the product.
Total Price (Numeric): Total revenue generated from sales of the product.
Maximum Retail Price (MRP) (Numeric): Maximum retail price of the product.
May 2022 Inventory Data
Product ID (Numeric): Unique identifier for each product.
Product Name (String): Name of the product.
Category (Categorical): Category to which the product belongs.
Stock Levels (Numeric): Current stock levels of the product.
Size (Categorical/Numeric): Size variation of the product.
Color (Categorical): Color variation of the product.
Catalog Category (Categorical): Category used for cataloging purposes.
Price (Numeric): Unit price of the product.
Total Price (Numeric): Total value of the stock of the product.
Maximum Retail Price (MRP) (Numeric): Maximum retail price of the product.
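The attribute types listed above can be declared explicitly when the data is loaded. The following is a minimal sketch in pandas, assuming column headers that match the attribute names on this slide; the rows are illustrative and the real Kaggle file may use different headers and size labels.

```python
import pandas as pd

# Illustrative rows only; not taken from the project dataset.
sales = pd.DataFrame({
    "Product ID": [101, 102, 103],
    "Product Name": ["Kurta A", "Kurta B", "Kurta C"],
    "Category": ["Kurta", "Kurta", "Kurta"],
    "Size": ["S", "M", "L"],
    "Sales Quantity": [12, 7, 3],
    "Price": [499.0, 549.0, 599.0],
})

# Mark categorical attributes as categories rather than free text.
for col in ["Category", "Size"]:
    sales[col] = sales[col].astype("category")

# Size has a natural order, so an ordered categorical keeps S < M < L
# in sorts and on chart axes (the labels here are assumed).
sales["Size"] = sales["Size"].cat.set_categories(["S", "M", "L"], ordered=True)

print(sales.dtypes)
```

Declaring Size as ordered is a design choice: without it, charts sort sizes alphabetically (L, M, S), which is harder to read.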
Detailed Description of Dataset
The dataset comprises two main components: sales reports and inventory data.
Sales Report: Contains information about products sold, including their
identifiers, names, categories, sales quantities, stock levels, prices, size,
color, catalog categories, sales platforms, weight, total revenue generated,
and maximum retail prices.
Inventory Data (May 2022): Provides details on the inventory status of
products as of May 2022, including product identifiers, names, categories,
stock levels, size variations, color variations, catalog categories, prices, total
stock values, and maximum retail prices.
Data Transformation:
Handling Missing Values:
Identify missing values in each attribute.
Impute missing values for numerical attributes using mean or median.
Impute missing values for categorical attributes using the mode.
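As a rough illustration of the three steps above, here is a minimal pandas sketch. The data is made up, and choosing the median over the mean for Price is an assumption, not a decision stated in the proposal.

```python
import numpy as np
import pandas as pd

# Made-up rows with gaps; not taken from the project dataset.
df = pd.DataFrame({
    "Price": [100.0, np.nan, 300.0, 1000.0],
    "Color": ["Red", "Blue", None, "Red"],
})

# 1. Identify missing values in each attribute.
missing_per_column = df.isna().sum()

# 2. Numerical attribute: the median is less sensitive to extreme prices than the mean.
df["Price"] = df["Price"].fillna(df["Price"].median())

# 3. Categorical attribute: fill with the most frequent value (the mode).
df["Color"] = df["Color"].fillna(df["Color"].mode().iloc[0])
```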
Encoding Categorical Variables:
Perform one-hot encoding for categorical variables like Category, Size, Color, Catalog Category, and
Platform.
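One-hot encoding could look like the following sketch, which uses `pandas.get_dummies` on two of the listed attributes. The sample values come from the examples given on the attribute slide; the real data will have more levels.

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame({
    "Category": ["Clothing", "Electronics", "Clothing"],
    "Platform": ["Online", "In-store", "Online"],
    "Price": [499.0, 899.0, 549.0],
})

# Each listed column is replaced by one 0/1 indicator column per distinct value.
encoded = pd.get_dummies(df, columns=["Category", "Platform"], dtype=int)
print(encoded.columns.tolist())
```

Attributes with many distinct values, such as Color, produce one column per value, so it may be worth grouping rare values before encoding.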
Scaling Numerical Features:
Apply standardization or min-max scaling to numerical features like Sales Quantity, Stock Levels,
Price, Weight, Total Price, and MRP.
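Both scaling options can be written directly in pandas. This is a sketch on made-up values; the use of the population standard deviation (`ddof=0`) is an assumption.

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame({"Price": [100.0, 200.0, 300.0], "Weight": [0.2, 0.5, 0.8]})
numeric_cols = ["Price", "Weight"]

# Standardization: subtract the column mean, divide by the column standard deviation.
standardized = (df[numeric_cols] - df[numeric_cols].mean()) / df[numeric_cols].std(ddof=0)

# Min-max scaling: map each column onto the range [0, 1].
col_min = df[numeric_cols].min()
col_range = df[numeric_cols].max() - col_min
min_max = (df[numeric_cols] - col_min) / col_range
```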
Additional Preprocessing Steps:
Perform outlier detection and handling.
Normalize data if necessary.
Split data into training and testing sets.
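The proposal does not name an outlier rule, so the sketch below assumes the common 1.5 x IQR rule and an 80/20 random split. Both are illustrative choices on made-up prices.

```python
import pandas as pd

# Made-up prices with one extreme entry.
df = pd.DataFrame({"Price": [100, 110, 120, 130, 140, 150, 160, 5000]})

# Outlier detection with the 1.5 * IQR rule (an assumed choice).
q1, q3 = df["Price"].quantile([0.25, 0.75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = ~df["Price"].between(lower, upper)
cleaned = df[~is_outlier]

# Random 80/20 split; a fixed seed keeps the split reproducible.
train = cleaned.sample(frac=0.8, random_state=42)
test = cleaned.drop(train.index)
```

Dropping outliers is only one way to handle them; capping values at the bounds is an alternative when rows should not be lost.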
Task Abstraction
Task (Target and Actions):
The primary objective of the task is to analyze retail sales and inventory data to derive actionable insights for inventory
management and business decision-making.
Target:
To identify sales trends, product performance, and inventory levels across different categories, sizes, and colors.
To understand customer preferences, pricing dynamics, and market trends.
Actions:
Analyze sales reports to identify top-selling products, popular categories, and seasonal trends.
Evaluate inventory data to assess stock levels, identify slow-moving items, and optimize inventory turnover.
Visualize sales data using charts and graphs to understand sales distribution by category, size, and color.
Utilize advanced analytics techniques to identify correlations between sales, pricing, and inventory levels.
Generate reports and dashboards to communicate findings and recommendations to stakeholders.
Implement data-driven strategies for inventory management, pricing optimization, and product assortment planning.
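Two of the actions above, finding top-selling products and slow-moving items, can be sketched in pandas as follows. The 10% sell-through threshold and the sell-through formula are hypothetical rules chosen for illustration, and the rows are made up.

```python
import pandas as pd

# Illustrative rows only.
sales = pd.DataFrame({
    "Product Name": ["A", "B", "C", "D"],
    "Sales Quantity": [120, 15, 60, 2],
    "Stock Levels": [30, 200, 40, 150],
})

# Top-selling products by units sold.
top_sellers = sales.nlargest(2, "Sales Quantity")

# Hypothetical slow-mover rule: units sold are under 10% of
# units sold plus units still in stock.
sell_through = sales["Sales Quantity"] / (sales["Sales Quantity"] + sales["Stock Levels"])
slow_movers = sales[sell_through < 0.10]
```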
Implementation using Tools
Data Visualization:
D3.js: Initial visualization of the uncleaned dataset to provide an overview.
Python Libraries (Matplotlib, Seaborn): Creating refined visualizations based on the cleaned
dataset.
Data Cleaning and Analysis:
Python: Utilized for data cleaning, preprocessing, and analysis.
Pandas: Handling datasets and performing data manipulation tasks.
NumPy: Used for numerical computations and array operations.
Dashboard Creation:
Microsoft Power BI: Designing interactive dashboards and reports incorporating refined
visualizations.
Report Generation:
Microsoft Power BI: Creating detailed reports summarizing analysis findings and insights.
Results for Analysis
Bar Chart Visualization Using D3.js (Pre-cleaning)
Explanation:
The bar chart shows how many products are in each category.
Each bar represents a category, and its height shows how many products are in
that category.
When you hover over a bar, it tells you the category name and the number of
products.
Story:
Imagine you have a store with different types of products.
The bar chart helps you see which types make up most of your catalog, based on how
many products are in each category.
For example, if you see a tall bar for "Kurta", it means you have a lot of Kurta
products in your store.
Pie Chart
Visualization:
Explanation:
The pie chart shows the proportion of products in each category compared to the total.
Each slice of the pie represents a category, and its size shows the proportion of products in
that category.
When you hover over a slice, it tells you the category name and what percentage of products
it represents.
Story:
Using the pie chart, you can quickly see which categories make up the biggest parts of your
store.
For instance, if the "Kurta" slice is the biggest, it means kurtas make up a large portion of
your products.
It helps you understand the overall composition of your store's inventory.
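The charts themselves were drawn with D3.js, but the numbers behind the bar heights and pie slices are simple aggregations. A minimal pandas sketch on made-up rows (the category names other than Kurta are assumed):

```python
import pandas as pd

# Illustrative rows only.
products = pd.DataFrame({"Category": ["Kurta", "Kurta", "Kurta", "Set", "Top"]})

# Bar chart: number of products per category.
counts = products["Category"].value_counts()

# Pie chart: each category's share of all products, in percent.
shares = (products["Category"].value_counts(normalize=True) * 100).round(1)
```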
Size-wise Sales Distribution:
Understanding Product Demand
Visualization Explanation:
This visualization displays the distribution of product sales by
size category. Each bar represents the number of products sold
in a particular size, with different colors distinguishing
between sizes. Users can hover over bars to see the exact
count of products for each size, providing a quick overview of
sales distribution.
Small Story:
Imagine you're a clothing store manager eager to understand
which sizes of products are selling the most. With this
visualization, you can easily see which sizes are most popular
among your customers at a glance.
By hovering over each bar, you can get precise numbers,
helping you make informed decisions about inventory
management and future product ordering.
This simple yet effective tool helps you optimize your stock
and cater better to your customers' preferences.
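The per-size totals behind this chart could be computed as in the sketch below. The size labels and their order are assumptions, and the rows are made up.

```python
import pandas as pd

# Illustrative rows only.
sales = pd.DataFrame({
    "Size": ["M", "S", "L", "M", "XL", "S", "M"],
    "Sales Quantity": [4, 2, 1, 6, 1, 3, 5],
})

size_order = ["S", "M", "L", "XL"]  # assumed labels, smallest to largest

# Total units sold per size, arranged in size order rather than alphabetically.
sales_by_size = sales.groupby("Size")["Sales Quantity"].sum().reindex(size_order)
best_selling_size = sales_by_size.idxmax()
```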
Color-coded Product Sales: Visualizing
Market Preferences
Visualization Explanation:
This visualization depicts the distribution of product sales based on color categories.
Each segment of the pie chart represents a color, with the size of the segment
corresponding to the number of products sold in that color.
Users can hover over each segment to view the exact count of products sold, providing
insight into consumer color preferences.
Story:
Imagine you're a retailer looking to understand which colors are most popular among your
customers.
This visualization helps you visualize sales trends by color, allowing you to identify which
colors are in high demand.
By hovering over each segment, you can see the specific sales figures for each color,
enabling you to adjust your inventory and marketing strategies accordingly to meet
consumer preferences.
Category-wise Product Distribution: Analyzing
Market Segmentation
Visualization Explanation:
This visualization illustrates the distribution of products across different
categories.
Each bar represents a product category, with the height indicating the
number of products within that category.
The color of each bar distinguishes between categories, allowing for easy
visual differentiation.
Users can interpret the chart to understand the relative popularity of
different product categories.
Story:
Picture yourself as a retail manager seeking insights into which product
categories drive the most sales.
With this visualization, you can easily identify the distribution of products
across various categories.
By observing the height of each bar, you can gauge the popularity of
different product types within your inventory.
This information empowers you to tailor marketing strategies and allocate
resources effectively to maximize sales and customer satisfaction.
After Cleaning the Dataset (Using Python)
Stock Levels by Category: Inventory Analysis
Visualization Explanation:
This bar chart visualizes the stock levels of different product categories.
Each bar represents a category, and its height indicates the corresponding
stock level.
By analyzing the chart, viewers can quickly grasp the distribution of stock
across various categories.
The x-axis denotes the product categories, while the y-axis represents the
stock levels.
Story:
Imagine you're a warehouse manager responsible for overseeing inventory
levels across different product categories.
This visualization offers a clear overview of current stock levels by
category.
By observing the heights of the bars, you can identify which categories
have higher or lower stock levels, helping you make informed decisions
regarding inventory management, restocking priorities, and resource
allocation.
This visual insight enables you to maintain optimal stock levels and ensure
smooth operations.
Stock Levels by Size: Inventory Analysis
Visualization Explanation:
This bar chart illustrates the stock levels of products categorized by size.
Each bar represents a specific size category, and its height indicates the corresponding
stock level.
By examining the chart, viewers can easily discern the distribution of stock across
different sizes.
The x-axis displays the size categories, while the y-axis represents the stock levels.
Story:
Imagine you're a retail manager tasked with managing inventory levels for various
product sizes.
This visualization provides valuable insights into stock levels by size category.
By observing the heights of the bars, you can quickly identify which sizes have higher or
lower stock levels, aiding in inventory planning and restocking decisions.
This visual representation empowers you to optimize stock management strategies and
ensure adequate availability of products across different sizes.
"Top 10 Colors by Stock Levels: Inventory Analysis"
Visualization Explanation:
This horizontal bar plot displays the stock levels of the top 10 colors based on their
cumulative stock levels.
Each bar represents a color, with the length indicating the corresponding stock level.
Colors are arranged in descending order of stock levels, allowing viewers to quickly
identify the colors with the highest stock levels.
The x-axis represents the stock levels, while the y-axis denotes the color categories.
Story:
Imagine you're a product manager interested in understanding which colors have the
highest stock levels in your inventory.
This visualization highlights the top 10 colors with the most stock, providing valuable
insights into color popularity and inventory distribution.
By examining the lengths of the bars, you can identify the dominant colors in your
inventory and adjust your marketing and sales strategies accordingly.
This visual analysis helps you optimize stock management and meet customer demand
effectively.
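A chart like the one described could be produced along these lines. This is a sketch with Matplotlib on made-up rows, not the project's actual plotting code; the column names are assumed from the attribute list.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Illustrative rows only.
inventory = pd.DataFrame({
    "Color": ["Red", "Blue", "Red", "Green", "Blue", "Black"],
    "Stock Levels": [10, 5, 20, 8, 12, 3],
})

# Cumulative stock per color, keeping at most the ten largest (already sorted descending).
top_colors = inventory.groupby("Color")["Stock Levels"].sum().nlargest(10)

fig, ax = plt.subplots(figsize=(6, 4))
ax.barh(top_colors.index.tolist(), top_colors.values)
ax.invert_yaxis()  # barh draws the first item at the bottom; invert so the largest is on top
ax.set_xlabel("Stock Levels")
ax.set_ylabel("Color")
ax.set_title("Top 10 Colors by Stock Levels")
fig.tight_layout()
```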
Distribution of Products by Category: Market
Analysis
Visualization Explanation:
This count plot visualizes the distribution of products across different categories.
Each bar represents a product category, and its height corresponds to the count of
products within that category.
By examining the chart, viewers can easily grasp the relative distribution of products
among various categories.
The x-axis denotes the product categories, while the y-axis represents the count of
products.
Story:
Imagine you're a retail analyst interested in understanding the product distribution
across different categories in your store.
This visualization offers a clear overview of how products are distributed among
various categories.
By observing the heights of the bars, you can quickly identify which categories have a
higher or lower product count, aiding in inventory management and marketing
strategies.
This visual insight enables you to optimize product assortment and cater effectively to
customer preferences.
Distribution of Products by Catalog Category (May 2022)
Visualization Explanation:
This count plot illustrates the distribution of products across catalog categories
specifically for May 2022.
Each bar represents a catalog category, with the height indicating the count of
products within that category.
By analyzing the chart, viewers can gain insights into how products are
distributed among different catalog categories during May 2022.
The x-axis denotes the catalog categories, while the y-axis represents the count
of products.
Story:
As a marketing analyst analyzing sales data for May 2022, understanding the
distribution of products across catalog categories is crucial.
This visualization provides a clear overview of product distribution within specific
catalog categories during that month.
By examining the heights of the bars, you can identify which catalog categories
had higher or lower product counts, helping you tailor marketing strategies and
optimize inventory management for future months.
This visual analysis aids in understanding market trends and making data-driven
decisions to drive business growth.
Price Distribution Across Different Platforms (May 2022)
Visualization Explanation:
This box plot compares the distribution of prices across different platforms for
products in May 2022.
Each box represents the distribution of prices for a specific platform, providing
insights into the variability and spread of prices.
The horizontal orientation of the plot facilitates easy comparison of price
distributions across platforms.
The x-axis denotes the price values, while the y-axis represents the platforms.
Story:
For retailers and analysts, understanding price distribution across various
platforms in May 2022 is vital for pricing strategies and competitive analysis.
This visualization presents a comprehensive view of price distributions, allowing
comparison of pricing strategies across different platforms.
By examining the box plots, one can discern the range and variability of prices
for each platform, aiding in decision-making related to pricing adjustments and
competitive positioning.
This visual insight empowers retailers to adapt pricing strategies effectively and
stay competitive in the market.
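The quantities a box plot draws (quartiles, median, interquartile range) can also be tabulated per platform, which makes the comparison exact. A sketch on made-up prices, assuming `Platform` and `Price` column names:

```python
import pandas as pd

# Illustrative rows only.
df = pd.DataFrame({
    "Platform": ["Online"] * 5 + ["In-store"] * 5,
    "Price": [100, 200, 300, 400, 500, 150, 160, 170, 180, 190],
})

# One row per platform with the values that define each box.
summary = df.groupby("Platform")["Price"].quantile([0.25, 0.5, 0.75]).unstack()
summary.columns = ["q1", "median", "q3"]
summary["iqr"] = summary["q3"] - summary["q1"]
```

A larger `iqr` corresponds to a wider box, that is, more price variability on that platform.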
Distribution of Prices and Weights (May 2022)
Visualization Explanation:
This box plot compares the distribution of prices (TP and MRP Old) and weights
for products in May 2022.
Each box represents the distribution of values for a specific variable, allowing for
insights into the variability and spread of prices and weights.
The horizontal orientation of the plot facilitates easy comparison of distributions.
The x-axis denotes the value of prices or weights, while the y-axis represents the
variables (prices or weights).
Story:
In May 2022, understanding the distribution of both prices and weights of
products is crucial for retailers and analysts.
This visualization offers a comprehensive view of the distributions, enabling
comparison between prices and weights.
By examining the box plots, one can discern the range and variability of prices
and weights, aiding in decision-making related to pricing strategies and inventory
management.
This visual insight empowers retailers to optimize pricing strategies and product
offerings effectively.
"Distribution of TP and MRP Before and After
Data Cleaning (May 2022)"
Visualization Explanation:
These histograms compare the distribution of TP (Total Price) and MRP (Maximum Retail Price)
before and after data cleaning for May 2022.
Each subplot represents the distribution of values for TP or MRP, with the original data shown in blue
and the cleaned data shown in orange.
The histograms display the frequency of occurrence of TP and MRP values, with a kernel density
estimate overlaid for smoother visualization.
Story:
Before making business decisions based on pricing data for May 2022, it's essential to understand the
impact of data cleaning on the distribution of TP and MRP.
These histograms offer a visual comparison of TP and MRP distributions before and after data
cleaning.
By observing the histograms, one can assess any changes in the distribution patterns post-cleaning.
This visual analysis helps ensure data integrity and reliability, enabling informed decision-making
regarding pricing strategies and market positioning.
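For a before/after comparison to be fair, both histograms should use the same bin edges. The sketch below shows that idea with NumPy on synthetic TP values; the threshold filter is only a stand-in for the real cleaning step, and the kernel density overlay from the slide is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic TP values plus two extreme entries; not the project data.
original = np.append(rng.normal(500, 50, 200), [5000.0, 6000.0])
cleaned = original[original < 1000]  # stand-in for the real cleaning step

# Shared bin edges make the two histograms directly comparable.
bin_edges = np.histogram_bin_edges(original, bins=20)
hist_before, _ = np.histogram(original, bins=bin_edges)
hist_after, _ = np.histogram(cleaned, bins=bin_edges)

# Bins where the counts differ show what the cleaning step removed.
removed_per_bin = hist_before - hist_after
```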
"Correlation Matrix (Cleaned May 2022 Data)"
Visualization Explanation:
This heatmap displays the correlation matrix for Total Price (TP), Maximum Retail Price
(MRP Old), and Weight variables in the cleaned May 2022 data.
Each cell in the heatmap represents the correlation coefficient between two variables,
with color intensity indicating the strength and direction of correlation.
Positive correlations are shown in warmer colors (e.g., red), while negative correlations
are shown in cooler colors (e.g., blue).
Values closer to 1 or -1 indicate stronger correlations, while values closer to 0 indicate
weaker correlations.
Small Story:
Understanding the relationships between Total Price, Maximum Retail Price, and Weight
variables in the cleaned May 2022 data is crucial for retailers and analysts.
This heatmap provides a visual representation of the correlations between these
variables.
By analyzing the heatmap, one can identify any significant correlations that may exist
between pricing and weight attributes.
This insight helps in understanding how changes in one variable may affect others,
aiding in pricing strategies and product positioning decisions.
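The matrix behind the heatmap can be computed with `DataFrame.corr`. A minimal sketch on made-up values, using the column names mentioned on this slide (TP, MRP Old, Weight) and assuming Pearson correlation:

```python
import pandas as pd

# Illustrative values only.
df = pd.DataFrame({
    "TP": [100.0, 200.0, 300.0, 400.0],
    "MRP Old": [150.0, 290.0, 460.0, 600.0],
    "Weight": [0.5, 0.4, 0.45, 0.3],
})

# Pairwise Pearson correlation coefficients; the result is symmetric with 1.0 on the diagonal.
corr = df.corr(method="pearson")
print(corr.round(2))
```

Each cell of the heatmap is one entry of `corr`; the color scale maps values from -1 to 1.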
Distribution of Products by Catalog Category (May
2022)
Visualization Explanation:
This bar plot illustrates the distribution of products across catalog categories in May 2022.
Each bar represents a catalog category, with the height indicating the count of products within that
category.
The colors of the bars are chosen from the 'viridis' palette for enhanced visual appeal.
The x-axis denotes the catalog categories, while the y-axis represents the count of products.
Story:
Analyzing the distribution of products across catalog categories for May 2022 is essential for
understanding product assortment and market segmentation.
This visualization provides a clear overview of how products are distributed among different catalog
categories.
By examining the heights of the bars, one can identify which catalog categories have higher or lower
product counts, aiding in inventory management and marketing strategies.
This visual insight enables retailers to optimize product offerings and cater effectively to customer
preferences.
"Distribution of MRP by Category (May 2022)"
Visualization Explanation (MRP):
This box plot compares the distribution of Maximum Retail Price (MRP) across different
product categories in May 2022.
Each box represents the distribution of MRP values for a specific category, enabling insights
into the variability and spread of prices within each category.
The x-axis denotes the product categories, while the y-axis represents the MRP values.
Story (Both Visualizations):
Analyzing the distribution of TP and MRP across different product categories for May 2022
provides valuable insights into pricing strategies and market segmentation.
These visualizations offer a clear comparison of price distributions among various categories,
aiding in understanding pricing dynamics and consumer preferences within each category.
By examining the box plots, retailers can identify pricing trends, outliers, and potential areas
for price adjustments to optimize revenue and competitiveness within specific product
categories.
Microsoft Power BI
Work Management
Work Completed:
Data Acquisition and Cleaning:
Description: Bar chart visualization using D3.js (pre-cleaning).
Responsibility: All team members.
Contributions: 100% by each member.
Initial Visualization Using D3.js:
Description: Created a pie chart showing the proportion of products using D3.js.
Responsibility: Sai Teja Talluri, Swetha Vahana Reddy Morreppa.
Contributions: 50% each.
Understanding Product Demand:
Description: Developed a bar chart showing the distribution of product sales by size category using D3.js.
Responsibility: Sai Ritwik Reddy Nagireddy, Jaswanthi Chowdary Kalapala.
Contributions: 50% each.
Visualizing Market Preferences:
Description: Showed the distribution of product sales by color category using a pie chart.
Responsibility: Sai Teja Talluri, Swetha Vahana Reddy Morreppa.
Contributions: 50% each.
Cleaning Datasets Using Python and Visualization with Matplotlib:
Description: Cleaned the dataset using the Python programming
language and libraries such as Pandas and NumPy to handle missing values, outliers,
and inconsistencies.
Responsibility: All team members.
Contributions: 100% by each member.
Category-wise Product Distribution: Analyzing Market Segmentation:
Description: Analyzed the distribution of products across different
categories.
Responsibility: Sai Ritwik Reddy Nagireddy, Jaswanthi Chowdary Kalapala.
Contributions: 50% each.
After Cleaning Dataset:
Description: Gives an overview of current stock levels by category.
Responsibility: Sai Teja Talluri, Swetha Vahana Reddy Morreppa.
Contributions: 50% each.
Inventory Analysis:
Description: Analyzed the distribution of Stock Levels by Size and the top 10 Colors by
Stock Levels.
Responsibility: Sai Ritwik Reddy Nagireddy, Jaswanthi Chowdary Kalapala.
Contributions: 50% each.
Price and Weights Distribution Across Different Platforms:
Description: Gives an overview of the distribution of prices for each platform.
Responsibility: Sai Teja Talluri, Swetha Vahana Reddy Morreppa.
Contributions: 50% each.
Correlation Matrix:
Description: Analyzed the correlation coefficients between pairs of variables.
Responsibility: Sai Ritwik Reddy Nagireddy, Jaswanthi Chowdary Kalapala.
Contributions: 50% each.
Overall Contributions:
Sai Ritwik Reddy Nagireddy - 25%, Jaswanthi Chowdary Kalapala - 25%, Sai Teja Talluri - 25%,
Swetha Vahana Reddy Morreppa - 25%
References:
1. Kaggle. (n.d.). Kaggle: Your Home for Data Science. Retrieved from
https://www.kaggle.com/
2. D3.js. (n.d.). D3.js - Data-Driven Documents. Retrieved from
https://d3js.org/
3. Python Software Foundation. (n.d.). Python. Retrieved from
https://www.python.org/
4. McKinney, W., Perktold, J., Seabold, S., & Wes McKinney. (2020). pandas-dev/pandas:
Pandas 1.2.4. Zenodo. https://doi.org/10.5281/zenodo.4611042